Discovering Hypernymy Relations using Text Layout
Authors
Abstract
Hypernymy relation acquisition has been widely investigated, especially because taxonomies, which often constitute the backbone structure of semantic resources, are structured using this type of relation. Although many approaches have been dedicated to this task, most of them analyze only the written text. However, relations between textual units that are not necessarily contiguous can be expressed through typographical or dispositional markers. Such relations, which are out of reach of standard NLP tools, have been investigated in well-specified layout contexts. Our aim is to improve the relation extraction task by considering both the plain text and the layout. We propose a method that combines layout, discourse and terminological analyses, and performs a structured prediction. We focus on textual structures that correspond to a well-defined discourse structure and that often bear hypernymy relations. This type of structure encompasses titles and sub-titles, or enumerative structures. The results achieve a precision of about 60%.
Similar resources
Supervised Learning of German Qualia Relations
In the last decade, substantial progress has been made in the induction of semantic relations from raw text, especially of hypernymy and meronymy in the English language and in the classification of noun-noun relations in compounds or other contexts. We investigate the question of learning qualia-like semantic relations that cross part-of-speech boundaries for German, by first introducing a han...
Reducing VSM data sparseness by generalizing contexts: application to health text mining
Vector Space Models are limited for low-frequency words because of the few available contexts and data sparseness. To tackle this problem, we generalize contexts by integrating semantic relations acquired with linguistic approaches. We use three methods that acquire hypernymy relations on an EHR corpus. Context Generalization obtains the best results when performed with hypernyms, the quality of the re...
Resolving and Generating Definite Anaphora by Modeling Hypernymy using Unlabeled Corpora
We demonstrate an original and successful approach for both resolving and generating definite anaphora. We propose and evaluate unsupervised models for extracting hypernym relations by mining cooccurrence data of definite NPs and potential antecedents in an unlabeled corpus. The algorithm outperforms a standard WordNet-based approach to resolving and generating definite anaphora. It also substa...
Ontology Population using Corpus Statistics
This paper presents a combination of algorithms for automatic ontology building based mainly on lexical cooccurrence statistics. We populate an ontology with hypernymy links; thus we refer more specifically to a taxonomy of lexical units (nouns organized by hypernymy relations) rather than an ontology of formally defined concepts. A set of combined statistical procedures produces fragments of ta...
Query based Text Document Clustering using its Hypernymy Relation
Clustering of text can be organized in an unsupervised manner. In this paper, text document clustering is based on a query and its semantic relations. The method uses hypernymy to identify these relations, which are detected using WordNet. WordNet acts as background knowledge for the query and provides its synonymous terms. This paper proposes a new term-document matrix called Query based doc...